Scientific Python antipatterns advent calendar day ten

Today’s post is a follow-up of sorts to yesterday’s. As a reminder, I’ll post one tiny example per day, with the intention that each should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list and I’ll send a single email at the end with links to them all.

Using `print` and `return` for error handling

In yesterday’s post we looked at why we generally try to avoid returning a special value from a function if we don’t have to. But what if our function runs into a situation that it can’t handle? Imagine we have a tiny function that takes an email address, splits it into two parts - the username and the domain name - and returns just the domain name:

def get_domain(email_address):

    # split and take the last element of the split
    domain = email_address.split('@')[-1]
    return domain

get_domain('martin@pythonforbiologists.com'), get_domain('billgates@microsoft.com')

('pythonforbiologists.com', 'microsoft.com')

How will this function handle incorrect inputs? Some will generate an error automatically:

get_domain(42)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 get_domain(42)

Cell In[9], line 4, in get_domain(email_address)
      1 def get_domain(email_address):
      2 
      3     # split and take the last element of the split
----> 4     domain = email_address.split('@')[-1]
      5     return domain

AttributeError: 'int' object has no attribute 'split'

but some will give incorrect output:

get_domain('martin_at_pythonforbiologists.com')

'martin_at_pythonforbiologists.com'

We would like to avoid this, as it will cause problems later on. We can easily add some error checking:

def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
        
get_domain('martin@pythonforbiologists.com')

'pythonforbiologists.com'

What happens now with an invalid input?

result = get_domain('martin_at_pythonforbiologists.com')
print(result)

None

Our function never hits a return, so we get the special value None. We will only notice this if we print the result; otherwise we might end up using it:

result = get_domain('martin_at_pythonforbiologists.com')
result == 'pythonforbiologists.com'

False

and never notice that the function did not do anything.
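To see how far a silent None can travel, here is a minimal sketch. The `describe` function is a hypothetical downstream step invented for this illustration; the point is only that the eventual error shows up somewhere other than in `get_domain`:

```python
def get_domain(email_address):
    # Same version as above: implicitly returns None when there is no '@'
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain


def describe(domain):
    # Hypothetical downstream step, not part of the original example
    return 'Domain: ' + domain.upper()


result = get_domain('martin_at_pythonforbiologists.com')

try:
    describe(result)
except AttributeError as error:
    # The failure surfaces here, not in the function that caused it
    message = str(error)
    print(message)
```

The error message talks about a `NoneType` object and says nothing about email addresses, so we would have to work backwards to find the real cause.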

Beginners often try to fix this by adding some debugging messages with print:

def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        print('ERROR: email address must contain an @ character!')

get_domain('martin_at_pythonforbiologists.com')

ERROR: email address must contain an @ character!

but this doesn’t stop us from accidentally storing the result and using it just like before:

result = get_domain('martin_at_pythonforbiologists.com')
result == 'pythonforbiologists.com'

ERROR: email address must contain an @ character!

False

We just have to hope that we notice the printed error message. Even worse is to return the error message:

def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        return 'ERROR: email address must contain an @ character!'

Now if we use our function to process a batch of email addresses:

email_addresses = [
    'martin@pythonforbiologists.com',
    'alice',
    'bob@gmail.com',
    'billg_at_microsoft.com'
]

domains = []
for email_address in email_addresses:
    domains.append(get_domain(email_address))

we are left with a list of strings that contains a mixture of valid domains and error messages:

domains

['pythonforbiologists.com',
 'ERROR: email address must contain an @ character!',
 'gmail.com',
 'ERROR: email address must contain an @ character!']

which will be very hard to deal with.
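As a rough illustration of why this is hard to deal with, imagine we want to count how often each domain appears. The sketch below starts from the mixed list shown above; the counting step and the variable names are assumptions added for this example:

```python
from collections import Counter

# The mixed list produced by the loop above
domains = [
    'pythonforbiologists.com',
    'ERROR: email address must contain an @ character!',
    'gmail.com',
    'ERROR: email address must contain an @ character!',
]

# Counting happily treats the error message as if it were a domain
counts = Counter(domains)
most_common_domain, occurrences = counts.most_common(1)[0]
print(most_common_domain, occurrences)

# The only way to filter is to inspect the text of each string,
# which breaks as soon as the wording of the message changes
valid_domains = [d for d in domains if not d.startswith('ERROR')]
print(valid_domains)
```

Nothing in the code distinguishes a real domain from an error message, so the "most common domain" turns out to be the error text.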

So what’s the right way to deal with this? Use Python’s built in exception system to signal the error:

def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        raise ValueError('ERROR: email address must contain an @ character!')

This makes no difference to the behaviour for valid inputs:

get_domain('martin@pythonforbiologists.com')

'pythonforbiologists.com'

but immediately triggers a crash on invalid inputs:

get_domain('martin_at_pythonforbiologists.com')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 get_domain('martin_at_pythonforbiologists.com')

Cell In[23], line 6, in get_domain(email_address)
      4     return domain
      5 else:
----> 6     raise ValueError('ERROR: email address must contain an @ character!')

ValueError: ERROR: email address must contain an @ character!

Now it is impossible for us to accidentally use the return value, as the function never returns (it crashes instead). And it’s impossible for us to ignore the error! We now have a function that can only ever do one of two things:

- return the correct output for a valid input
- crash on an invalid input

and using the rest of Python’s exception system (which is too long an explanation for an advent calendar post!) we will easily be able to decide how to handle errors.
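Without going into the details, here is a minimal sketch of what that might look like for the batch of email addresses from earlier. The `invalid_addresses` list is a name introduced just for this example, and collecting the bad inputs is only one of several reasonable ways to respond to the error:

```python
def get_domain(email_address):
    if '@' in email_address:
        domain = email_address.split('@')[-1]
        return domain
    else:
        raise ValueError('ERROR: email address must contain an @ character!')


email_addresses = [
    'martin@pythonforbiologists.com',
    'alice',
    'bob@gmail.com',
    'billg_at_microsoft.com'
]

domains = []
invalid_addresses = []  # hypothetical name, used only in this sketch
for email_address in email_addresses:
    try:
        domains.append(get_domain(email_address))
    except ValueError:
        # The caller decides what to do: here we set the bad input aside
        invalid_addresses.append(email_address)

print(domains)
print(invalid_addresses)
```

The valid domains and the problem inputs now end up in separate lists, and the decision about how to respond lives in the calling code rather than inside `get_domain`.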

Bonus: by convention in Python - and because it’s generally clearer - we try to put error-checking code at the start of the function so that we can easily skip it when reading the code and trying to understand the expected behaviour:

def get_domain(email_address):
    
    if '@' not in email_address:
        raise ValueError('ERROR: email address must contain an @ character!')    
        
    domain = email_address.split('@')[-1]
    return domain

This also avoids the need for a separate else branch, leaving the code even cleaner.

One more time: if you want to see the rest of these little write-ups, sign up for the mailing list.
